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ABSTRACT 

ppdb (http://ppdb.agr.gifu-u.ac.jp) is a plant pro- 
moter database that provides information on tran- 
scription start sites (TSSs), core promoter structure 
(TATA boxes, Initiators, Y Patches, GA and CA 
elements) and regulatory element groups (REGs) as 
putative and comprehensive transcriptional regula- 
tory elements. Since the last report in this journal, the 
database has been updated in three areas to version 
3.0. First, new genomes have been included in the 
database, and now ppdb provides information on 
Arabidopsis thaliana, rice, Physcomitrella patens 
and poplar. Second, new TSS tag data (34 million) 
from A. thaliana, determined by a high throughput 
sequencer, has been added to give a 200-fold 
increase in TSS data compared with version 1.0. 
This results in a much higher coverage of 27 000 
A. thaliana genes and finer positioning of promoters 
even for genes with low expression levels. Third, 
microarray data-based predictions have been 
appended as REG annotations which inform their 
putative physiological roles. 

INTRODUCTION 

Gene regulation is a central part of morphogenesis and
environmental adaptation of higher plants, and it is
controlled by the promoter of each gene. Therefore, an
understanding of promoter structure is crucial to
understanding these fundamental processes of plants.

There are three aspects to promoter structure: (i) the 
position, direction and strength of the transcription start 
sites (TSSs) that indicate actual promoter position; (ii) the 
type and position of the core promoter elements such as 
TATA boxes and Initiators (Inrs) that are thought to be 
the major determinants of the direction and position of 
promoters and (iii) the type and position of transcriptional 
regulatory elements that are involved in gene regulation. 

In our last report (1), we introduced the plant promoter 
database (ppdb), which provided promoter information 
about TSS clusters, core promoter elements [TATA 
boxes, Inrs, Y Patches, GA and CA elements (2,3)] and 
regulatory element groups [REGs, putative position- 
sensitive transcriptional regulatory elements that are ex- 
tracted by local distribution of short sequences (LDSS) 
analysis (2)] as putative and comprehensive sets of
transcriptional regulatory elements. The original version 1.0
of the database contained information on two plant
species, Arabidopsis thaliana and rice.

MAJOR EXTENSIONS FROM VERSION 1.0 

The major amendment in version 3.0 is the addition of
the Physcomitrella patens and poplar genomes to the
database. The sources used for the information on the
four genomes, including A. thaliana and rice, are shown
in Table 1. The promoter elements of the moss genome
have been extracted by the LDSS method (2). During
extraction, we noticed that a considerable number of moss
genes are driven by a similar type of promoter that is
located within long terminal repeats. These promoters
affect the extraction process because of tight sequence
conservation that is not related to promoter function, and
for this reason they were excluded from the LDSS analysis.
A. thaliana promoter elements have been applied to the
poplar genome because the Brassicaceae and Malpighiales
are phylogenetically close.

A new function called 'Homologue Gene Search' has 
been added to facilitate the comparison of promoter struc- 
tures of orthologous genes within a species or between 
different species. Orthologue groups have been determined 
by Gclust, a system that classifies orthologues according 
to the presence or absence of protein motifs (16). 

New A. thaliana TSS data of 34 million tags, which
corresponds to a ~200-fold increase over the previous
data, have been added (Figure 1). REG annotations
have also been appended and show functional predictions
based on microarray data of responses to plant hormones
(AUX: auxin, BR: brassinosteroid, CK: cytokinin, ABA:
abscisic acid, ET: ethylene, JA: jasmonic acid, SA: salicylic
acid), responses to a hormone-like chemical (H2O2) and
some environmental stress-related responses (drought,
DREB1A overexpression) (7). Functional annotation of
53 of 308 REGs is now available in version 3.0 (Figure 2).
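
As a rough check of the figures quoted above, the fold increase and the annotated fraction can be recomputed from the counts listed in Table 1. This is only a sketch: it assumes that the baseline for the fold increase is the CT-MPSS tag set, which the text does not state explicitly.

```python
# Counts taken from Table 1 of this article.
NEW_TAGS = 34_206_936   # Oligo-Cap Illumina tags added in version 3.0
OLD_TAGS = 158_237      # CT-MPSS tags; ASSUMED to be the earlier baseline

# Ratio of new to old tag counts (roughly the "~200-fold" in the text).
fold_increase = NEW_TAGS / OLD_TAGS

# Share of REGs that carry a functional annotation (53 of 308).
annotated_percent = 100 * 53 / 308

print(f"fold increase: {fold_increase:.0f}")
print(f"annotated REGs: {annotated_percent:.1f}%")
```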

BROWSING PROMOTER STRUCTURE 

The major function of ppdb is to give an indication of a 
possible promoter structure for each gene in a genome 
based on the established lists of LDSS-positive elements. 
The information can be directly called by the gene ID 
(e.g. AT1G67090 or Os01g0791600), or selected from a 
list of 'Keyword Search' or 'Homologue Gene Search'. 
Pages for individual genes show the following informa- 
tion: (i) DNA sequence, (ii) TSS distribution (direction 
and strength at a 1-bp resolution), (iii) core promoter 
structure and (iv) REG data. 
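
The two example gene IDs follow different naming schemes, so a client script that prepares queries may want to recognise the species from the ID before looking a gene up. The sketch below is illustrative only: the regular expressions are assumptions inferred from the two examples above, and `classify_gene_id` is a hypothetical helper, not part of ppdb.

```python
import re

# ASSUMED patterns, inferred from the example IDs in the text.
# A. thaliana: AT + chromosome (1-5, C or M) + G + five digits,
# optionally followed by a gene-model suffix such as ".1".
ARABIDOPSIS_ID = re.compile(r"^AT[1-5CM]G\d{5}(\.\d+)?$", re.IGNORECASE)
# Rice: Os + two-digit chromosome + g + seven digits.
RICE_ID = re.compile(r"^Os\d{2}g\d{7}$", re.IGNORECASE)


def classify_gene_id(gene_id: str) -> str:
    """Return the species suggested by the ID format, or 'unknown'."""
    gene_id = gene_id.strip()
    if ARABIDOPSIS_ID.match(gene_id):
        return "A. thaliana"
    if RICE_ID.match(gene_id):
        return "rice"
    return "unknown"


for example in ("AT1G67090", "Os01g0791600", "not-a-gene"):
    print(example, "->", classify_gene_id(example))
```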

Table 1. Source of ppdb version 3.0

Specification | Source | Size

A. thaliana
Genome sequence and gene annotation | TAIR9; http://www.arabidopsis.org/, (4) |
TSS information | Selected RAFL cDNA; http://rarge.gsc.riken.jp/, (5) | 62 108 (clones^a)
TSS information | Cap signature CT-MPSS tags; (3) | 158 237 (tags^b)
TSS information | Oligo-Cap Illumina data; Tokizawa M, Yamanaka H, Koyama H, Sakurai T, Kurotani A, Shinozaki K, Suzuki Y, Sugano S, Obokata J, Yamamoto YY (unpublished data) | 34 206 936 (tags^b)
Promoter elements | A. thaliana LDSS-positive octamers; (2,3) | 659 (octamers^c)
Promoter elements | Annotation for LDSS elements: PLACE; http://www.dna.affrc.go.jp/PLACE/, (6) | 21 (only matched motifs^d)
Promoter elements | Annotation for LDSS elements: stress and hormonal responses; (7) | 53 (only matched motifs^d)

Rice (Oryza sativa)
Genome sequence and gene annotation | RGSP build 4.0; http://rapdb.lab.nig.ac.jp/, (8) |
TSS information | Carefully selected fl-cDNA (from KOME); http://cdna01.dna.affrc.go.jp/cDNA/, (9) | 17 286 (clones^a)
Promoter elements | Rice LDSS-positive octamers; (2,10) | 660 (octamers^c)
Promoter elements | Annotation for LDSS elements: PLACE; http://www.dna.affrc.go.jp/PLACE/, (6) | 4 (only matched motifs^d)

Moss (P. patens)
Genome sequence and gene annotation | JGI version 1.1, COSMOSS V1.6; http://www.cosmos.org, (11,12) |
TSS information | 5' CAGE; (13) | 1 122 382 (tags^b)
Promoter elements | P. patens LDSS-positive octamers; This work | 198 (octamers^c)

Poplar (Populus trichocarpa)
Genome sequence and gene annotation | Phytozome6; http://www.phytozome.net/poplar, (14) |
TSS information | FL-cDNA info from GenBank; (15) | 15 256 (clones^a, BP921855-937111); 36 103 (clones^a, DB874873-910976)
Promoter elements | A. thaliana LDSS-positive octamers; (2,3) | 659 (octamers^c)

Orthologue gene
Orthologue group | Gclust; (16) | 336 689 (families^e)

^a clone number, ^b tag number, ^c number of octamer sequences, ^d number of motifs and ^e number of orthologue families.

[Figure 1 screenshot: ppdb page 'PlantPromoterDB promoter information of AT1G02780.1', showing the 'Summary of Gene (AT1G02780.1)' panel with the fields Organism, Chromosome, Locus and Description.]

Figure 1. Indication of individual promoters. An Arabidopsis gene, AT1G02780.1, is shown. The information is composed of five panels: 'Summary
of Gene', 'Overview', 'Focused view' and also 'Promoter Summary' (not shown) and 'Other Reliable Promoter Summary' (not shown). The top TSS
(TSS Peak) is shown in the second column of the 'Focused view' as white letters on a red background. New TSS tag data (34 million) are shown at
the bottom of 'Focused view', highlighted in a red rectangle with rounded corners.

At the sequence window, promoter elements including
REGs and core elements are highlighted in a
position-dependent manner as the default setting. Note that
promoters without any TSS information do not show any
elements by default. To display the promoter elements of
these genes, the 'Reliable' button should be clicked,
which changes the state to 'All' (Figure 1, red arrow).
This button is a toggle switch between 'Reliable' and
'All'. 'Reliable' is the default setting, in which only
elements at appropriate positions relative to the peak
TSS are detected. The 'All' setting removes the positional
restriction on promoter elements, allowing global
detection. The sensitive area in the 'Reliable' mode for
each element group is described on the front page of the
database.
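
The difference between the 'Reliable' and 'All' modes can be pictured as a positional filter applied to the detected elements. The following is a minimal sketch under stated assumptions: the `Element` type, the `visible_elements` function and, in particular, the window coordinates in `SENSITIVE_AREA` are hypothetical placeholders. The actual sensitive areas are those given on the front page of the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str    # element group, e.g. "TATA", "Inr" or "REG"
    offset: int  # position relative to the peak TSS (negative = upstream)


# HYPOTHETICAL sensitive areas (inclusive start, end) per element group.
# These numbers are placeholders, not the values used by ppdb.
SENSITIVE_AREA = {
    "TATA": (-45, -18),
    "Inr": (-3, 3),
    "REG": (-300, -20),
}


def visible_elements(elements, mode="Reliable", has_tss=True):
    """Return the elements that would be highlighted in the given mode."""
    if mode == "All":
        # No positional restriction: every detected element is shown.
        return list(elements)
    if mode != "Reliable":
        raise ValueError(f"unknown mode: {mode!r}")
    if not has_tss:
        # Without a peak TSS there is no reference point for the windows.
        return []
    shown = []
    for element in elements:
        window = SENSITIVE_AREA.get(element.kind)
        if window and window[0] <= element.offset <= window[1]:
            shown.append(element)
    return shown


detected = [Element("TATA", -30), Element("TATA", -400), Element("REG", -100)]
print(len(visible_elements(detected, "Reliable")))  # elements inside windows
print(len(visible_elements(detected, "All")))       # every detected element
```

A promoter without TSS data returns an empty list in 'Reliable' mode, which mirrors the behaviour described above and is the reason for switching to 'All' for such genes.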

The 'TSS tag distribution' columns in the 'Focused
view' provide the expression strength of each TSS. The
value is the sum over six TSS tag libraries that were
prepared from leaves, roots, inflorescences, etiolated
seedlings, and shoots from low light-grown and high
light-grown seedlings.
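
The summation over libraries can be illustrated with a short sketch. The data layout (a mapping from library name to per-position tag counts), the library keys and the tie-breaking rule for the peak are assumptions made for illustration; the tag counts in the example are invented.

```python
from collections import Counter

# The six libraries named in the text; the key spellings are ASSUMED.
LIBRARIES = (
    "leaves",
    "roots",
    "inflorescences",
    "etiolated_seedlings",
    "low_light_shoots",
    "high_light_shoots",
)


def tss_strength(tag_counts):
    """Sum tag counts per genomic position over all six libraries.

    tag_counts maps a library name to {position: number_of_tags}.
    Libraries without data for this promoter may simply be missing.
    """
    total = Counter()
    for library in LIBRARIES:
        total.update(tag_counts.get(library, {}))
    return total


def peak_tss(total):
    """Return the position with the most tags (None if there are no tags).

    Ties are broken in favour of the smaller coordinate; this rule is an
    assumption, not something stated in the text.
    """
    if not total:
        return None
    return max(total, key=lambda pos: (total[pos], -pos))


example = {
    "leaves": {100: 3, 101: 1},   # invented counts for illustration
    "roots": {100: 2, 105: 4},
}
strength = tss_strength(example)
print(dict(strength), "peak:", peak_tss(strength))
```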



The 'Core promoter information' table shows the 
presence or absence of core promoter elements (TATA 
boxes, Inrs, Y Patches, GA and CA elements). 

The 'REG information' table shows a REG list together
with the corresponding PPDB motifs (2,3) and PLACE
motifs (6). REG sequences, as well as PPDB and PLACE
motifs, are linked to other pages containing biological
information. New REG annotations for A. thaliana obtained
from predicted cis-regulatory elements based on
microarray data (7) have been included (Figure 2).
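
In Figure 2, a 12-bp REG (ATTGGCCCATCA) is listed together with five overlapping octamers, and the same sites appear on the opposite strand as reverse complements. One way to picture this is as an octamer scan followed by merging of overlapping hits. The sketch below assumes that this is how a REG region relates to its octamers; the function names and the test sequence are hypothetical, and only the octamer sequences and IDs are taken from Figure 2.

```python
# Octamer sequences and IDs as shown in Figure 2.
OCTAMERS = {
    "AAACGGCA": "AtREG500",
    "ATTGGCCC": "AtREG446",
    "TTGGCCCA": "AtREG484",
    "TGGCCCAT": "AtREG490",
    "GGCCCATC": "AtREG403",
    "GCCCATCA": "AtREG635",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_octamer_hits(sequence, octamers):
    """Return (start, octamer_id) for each listed octamer, in order."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - 7):
        window = sequence[start:start + 8]
        if window in octamers:
            hits.append((start, octamers[window]))
    return hits


def merge_hits(sequence, hits):
    """Merge overlapping octamer hits into (start, end, sequence, ids)."""
    sequence = sequence.upper()
    regions = []
    for start, octamer_id in hits:
        end = start + 8
        if regions and start < regions[-1][1]:
            # Overlaps the previous region: extend it and record the ID.
            prev_start, prev_end, ids = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), ids + [octamer_id])
        else:
            regions.append((start, end, [octamer_id]))
    return [(s, e, sequence[s:e], ids) for s, e, ids in regions]


# Hypothetical test sequence containing two of the sites from Figure 2.
promoter = "TTAAACGGCATTTTATTGGCCCATCATT"
for region in merge_hits(promoter, find_octamer_hits(promoter, OCTAMERS)):
    print(region)
```

Scanning `reverse_complement(promoter)` with the same functions would give the hits on the opposite strand.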

Selection of the 'All' button (Figure 1) adds another
category, 'Not Reliable Promoter Summary', below
'Other Reliable Promoter Summary'. This category can
be used when searching for regulatory elements (REGs)
from wider regions or when there is no TSS information
on the promoter of interest.

[Figure 2 screenshot: 'REG information' table with the columns Type, Sequence, Annotation and Genome position (Strand, Start, End), listing each REG with its component AtREG octamers, PPDB motifs and PLACE motifs.]

Figure 2. REG information. REG information of the AT5G52310.1 (RD29A) promoter is shown. REG annotations, added in version 3.0, are
highlighted in a red rectangle with rounded corners.

ADDITIONAL PAGES 

The complete list of REGs for each of the genomes can be
viewed by selecting a cell in the table of 'Index of Genes'
at the top of the page. The lists present the relationships
between REG ID, sequence, PPDB motifs, PLACE motifs
and functional annotations. Selection of a specific
REG entry leads to 'Summary of the REG' and 'Entry
Sequences', which show the complete list of genes
containing the corresponding REG, together with gene
annotations.